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Convergent Evolution During Local Adaptation to Patchy 

Landscapes 



Species often encounter, and adapt to, many patches of locally similar environmental conditions across 
their range. Such adaptation can occur through convergent evolution as different alleles arise and spread 
in different patches, or through the spread of alleles by migration acting to synchronize adaptation across 
the species. The tension between the two reflects the degree of constraint imposed on evolution by the 
underlying genetic architecture versus how effectively selection acts to inhibit the geographic spread of 
locally adapted alleles. This paper studies a model of the balance between these two routes to adaptation 
in continuous environments with patchy selection pressures. We address the following questions: How 
long does it take for a novel, locally adapted allele to appear in a patch of habitat where it is favored 
through mutation? Or, through migration from another, already adapted patch? Which is more likely 
to occur, as a function of distance between the patches? How can we tell which has occurred, i.e. 
what population genetic signal is left by the spread of migrant alleles and how long does this signal 
persist for? To answer these questions we decompose the migration-selection equilibrium surrounding an 
already adapted patch into families of migrant alleles, in particular treating those rare families that reach 
new patches as spatial branching processes. This provides a way to understand the role of geographic 
separation between patches in promoting convergent adaptation and the genomic signals it leaves behind. 
We illustrate these ideas using the convergent evolution of cryptic coloration in the rock pocket mouse, 
Chaetodipus intermedins, as an empirical example. 

Author Summary: Often, a large species range will include patches where the species differs 
because it has adapted to locally differing conditions. For instance, rock pocket mice are often found 
with a coat color that matches the rocks they live in, and the difference in coloration is known to 
be controlled genetically. Sometimes, similar genetic changes have occurred independently in different 
patches, suggesting that there were few accessible ways to evolve the locally adaptive form. However, 
the genetic basis could also be shared if rare migrants may carry the locally beneficial genotypes between 
nearby patches, despite being at a disadvantage between the patches. We use a mathematical model 
of random migration to determine how quickly adaptation is expected to occur through new mutation 
or migration from other patches, and study in more detail what we would expect successful migrations 
between patches to look like. The results are useful for determining whether similar adaptations in 
different locations are likely to have the same genetic basis or not, and more generally in understanding 
how species adapt to patchy, heterogeneous landscapes. 

Introduction 

The convergent evolution of similar phenotypes in response to shared selection pressures is a testament 
to the power of selection to repeatedly sculpt phenotypic variation. In some cases this convergence 
extends to the molecular level, with disparate taxa converging to the same phenotype through parallel 
genetic changes in the same pathway, genes, or even by precisely the same genetic changes [Stern, 2013, 
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Zhen et al., 2012, Martin and Orgogozo, 2013]. Convergent adaptation can also occur within species, if 
individuals within a species adapt to the same environment, in parallel, through different genetic changes. 
There are a growing number of examples of this in a range of well studied organisms and phenotypes 
[Arendt and Reznick, 2008]. Such evolution of convergent phenotypes is favored by higher mutational 
input, i.e. higher total mutational rate and/or population size [Pennings and Hermisson, 2006]. The 
geographic distribution of populations can also affect the probability of parallel mutation within a species: 
a widespread species is more likely to adapt by multiple, parallel mutations if dispersal is geographically 
limited, since subpopulations will adapt via new mutations before the adaptive allele arrives via migration 
[Ralph and Coop, 2010]. Standing variation for a trait can also increase the probability of convergence, as 
standing variation for an allele increases the probability that the selective sweep will be soft (beginning 
from a base of multiple copies), leading to genetic patterns similar to convergent adaptation [Orr and 
Betancourt, 2001, Hermisson and Pennings, 2005]. 

Intuitively, convergence is also more likely when geographically separated populations adapt to eco- 
logically similar conditions. The probability that convergent adaptations arise independently before 
adaptations spread between the populations by migration will be larger if these adaptive alleles are mal- 
adapted in intervening environments, since such adverse conditions can strongly restrict the spread of 
locally adapted alleles [Slatkin, 1973]. 

One elegant set of such examples is provided by the assortment of plant species that have repeatedly 
adapted to patches of soil with high concentrations of heavy metals (e.g. serpentine outcrops and mine 
tailings) [Kruckeberg, 1951, Turner et al., 2010, Macnair, 1991, Schat et al., 1996]; the alleles conferring 
heavy metal tolerance are thought to be locally restricted because they incur a cost off of these patches. 
Similarly, across the American Southwest, a variety of species of animals have developed locally adaptive 
cryptic coloration to particular substrates, e.g. dark rock outcrops or white sand dunes [Benson, 1933]. 
One of the best-known examples is the rock pocket mouse (Chaetodipus intermedins), which is on a 
number of black lava flows has much darker pelage than on intervening, much lighter soil [Dice and 
Blossom, 1937]. Strong predator-mediated selection appears to favour such crypsis [Kaufman, 1974], 
and, perhaps as a result of this strong selection against migrants, at least two distinct genetic changes 
are responsible for the dark pigmentation found on different outcrops [Hoekstra and Nachman, 2003]. 
Similar situations have been demonstrated in other small rodent systems [Steiner et al., 2009, Kingsley 
et al., 2009, Dice, 1940] and in lizards [Rosenblum et al., 2010]. 

In this paper, we study this situation, namely, when a set of alleles provide an adaptive benefit in 
geographically localized patches, separated by (inhabited) regions where the alleles are deleterious. The 
main questions are: Under what conditions is it likely that each patch adapts in parallel, through new 
mutation, and when is it likely that migration carries these alleles between patches? How easy will it be 
to detect adaptive alleles that are shared by migration, i.e. over what genomic scale will a population 
genetic signal be visible? 

We work in a model of continuous geography, using a variety of known results and new methods. In 
the section Establishment by Mutation we develop a simple approximation, equation (2), for the rate at 
which patches become established by new mutations. The most important conceptual work of the paper 
occurs in the section Establishment by Migration, where we develop an understanding of the process 
by which alleles move from an existing patch to colonize a novel patch, culminating in equation (12) 
for the rate at which this happens. We combine these two results in the section Probability of Parallel 
Adaptation to discuss to the probability of parallel adaptation, equation (14). To understand the genomic 
signal left by adaptations shared by migration, in the section Haplotypes Shared Between Patches, we 
approximate the time it will take for an allele to transit between patches, (equation (19)), and use this 
to approximate the length of haplotype that hitchhikes with it (equation (21)). We then quantify how 
quickly this haplotype decays within patches (equation (24)) and use this to obtain properties about the 
equilibrium distribution of haplotype lengths (equation (25)), in the section Haplotypes Shared Within 
a Patch. Finally, in the section Applications we apply this work to understand the geographic scale of 
convergent evolution in Chaetodipus intermedins. 
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Results 

Model of a patchy landscape 

Consider a population spread across a landscape to which it is generally well adapted, but within which 
are patches of habitat to which individuals are (initially) poorly adapted. (When we refer to "patches" 
it is to these pieces of poor habitat.) Suppose it takes only a single mutational change to create an allele 
(B) that adapts an individual to the poor habitat type. The required change could be as specific as a 
single base change, or as general as a knockout of a gene or one of a set of genes. We assume that this 
mutational change occurs at a (low) rate of /x per chromosome per generation, and that this initially rare 
new allele has fitness 1 + s p relative to the unmutated type (b) in these "poor" habitat patches. Outside 
of these patches we assume that the new allele has a relative fitness 1 — s m when rare in the intervening 
areas, with s p and s m both positive. (Here we define "fitness" to be the intrinsic growth rate of the type 
when rare.) To simplify matters, we assume that the disadvantage s m is sufficiently large that, on the 
relevant timescale, the allele is very unlikely to fix locally in these intervening regions. (The case where 
s m — 0 requires a different approach, which we do not treat here.) 

We are interested in the establishment of mutations in the "poor" patches by either migration or 
mutation, and so will focus on whether the allele can escape initial loss by drift when rare. Therefore, 
we need not specify the fitness of the homozygote; only that the dynamics of the allele when rare are 
determined by the fitness of the heterozygote. More general dominance will only make small corrections 
to the dynamics until initial fixation, with the exception of the recessive case, which we omit. In other 
words, we follow the literature in treating the diploid model as essentially haploid. 

Also, to avoid introducing more parameters, we assume population density p is equal across the range 
(even in the "poor" patches). If this is not the case, in most equations below, the population density that 
should be used is the density on the poor patches; see Lenormand [2002] for a more thorough approach. 
We further assume that the variance in offspring number is £ 2 , and that the mean squared distance 
between parent and child is a 2 (i.e. a is the dispersal distance). 

Past work on spatial models We will make use of several previous results from the literature on 
migration in models of spatially varying selection. Haldane [1948], Fisher [1950], and later Slatkin [1973] 
first analyzed deterministic models of of this sort. Nagylaki [1975] and Conley [1975] showed that in 
one dimension, if the physical width of the patch is less than (a/^/2s p ) tan -1 (^/ s m / s p ) , then there is no 
stable equilibrium with both b and B present, so that migrational swamping prevents B from establishing 
(see also Lenormand [2002] for a review). Barton [1987], using general theory of Pollak [1966], showed 
that patches must be at least this critical size so that a new mutation has positive probability of estab- 
lishment (in the large population density limit). The same paper also showed that this probability of 
establishment of a new mutation decays exponentially with distance from the patch, with the scale given 
by a I ' \J2s m . Mutations appearing within the patch have probability of establishment strictly less than 
the probability for a panmictic population, but approaching this as the size of the patch increases. This 
latter panmictic probability of establishment we denote p e , and often approximate p e by 2s p /£ 2 [Haldane, 
1927, Fisher, 1930]. This result holds quite generally for a geographically spread population that expe- 
riences a uniform selection pressure [Maruyama, 1970, Cherry and Wakeley, 2003]. More general work 
on the mathematically equivalent problem of density-dependent population growth has obtained much 
more general criteria under which a habitat configuration will support a stable polymorphic equilibrium 
[Cantrell and Cosner, 1989] , and determined the habitat configuration that maximizes the probability of 
establishment [Lou and Yanagida, 2006]. 

Establishment of a locally adaptive allele due to mutational influx 

Consider first a single, isolated poor habitat patch in which no B allele has yet become established. We 
first compute the time scale on which a new B mutations will appear and fix in such a patch, if the rate 
of mutation from 6 to B is /x per individual per generation. As we are interested in patches where local 
adaptation can occur, we will assume that our patch is larger than the cutoff for local establishment 
mentioned above. 

Let p(x) be the probability that a new mutant B allele that arises at location x relative to the center 
of the patch fixes within the poor habitat patch. Under various assumptions, precise expressions can 
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be found for p(x) [Barton, 1987], but results will be more interpretable if we proceed with a simple 
approximation. The total influx per generation of mutations that successfully establish is given by the 
integral of p(x) over the entire species range: 



and hence depends in a complicated way on the patch geometry and selection coefficients, but still scales 
linearly with the mutational influx density pp.. If we consider a patch of area A, whose width is large 
relative to a/^/2s m , then a reasonable approximation is to ignore migration, so that p{x) = p e ~ 2s p /£ 2 
within the patch, and p(x) = 0 otherwise. This approximates the integrand p(x) in (1) by a step function, 
which will be a good approximation if the patch is large relative the scale over which p{x) goes from 0 
to p e , or if p(x) is symmetric about the edge of the patch. We examine this more generally via exact 
calculation of p(x) in the section Probability of Establishment. 

The rate at which mutations arise and colonize a patch of area A is then 



i.e. the product of mutational influx in the patch (pAfi) and the probability of establishment (2s p /^ 2 ). If 
this rate is low, then the time (in generations) until a mutation arises that will become locally established 
within the patch is exponentially distributed with mean 1 / A mut . Assuming that once a mutation becomes 
established it quickly reaches its equilibrium frequency across the patch, the time scale on which new 
patches become colonized by the B allele from new mutation is therefore approximately 1/A mut . 

Simulation methods 

First we briefly describe the simulations used below for illustration and validation. We simulated forward- 
time dynamics of the number of alleles of each type in a regular grid of demes with fixed size N . Each 
generation, each individual independently chose to reproduce or not with a probability depending on her 
type and location in the grid; locally beneficial alleles were more likely to reproduce. Each then produced 
a random, Poisson distributed number of offspring that each independently migrated to demes no more 
than six steps away according to a symmetric migration kernel. Once migrants were distributed, each 
deme was uniformly resampled back down to N individuals. (Although we described the simulation in 
terms of individuals, we kept track only of total numbers in an equivalent way.) 

The reproduction rate in simulations for type b alleles was 30%; this was then multiplied by 1 + s to 
get the probability of reproduction for type B. These s are the values reported in the figures, and do 
not depend on the basic rate of reproduction. However, to obtain values for s p and s m when comparing 
theory to simulation, we computed the rate of intrinsic growth, i.e. the s so that the numbers of B alleles 
when rare would change by e st after t generations in the absence of migration. (The resulting values are 
close to the first notion of s, but give better agreement with theory, which uses the second definition.) 

To sample lineages, we first simulated forwards in time the population dynamics, then sampled 
lineages moving back through time by, in each generation, moving each lineage to a new deme with 
probability proportional to the reverse migration probability weighted by the number of B alleles in that 
deme in the previous time step. If more than one lineage was found in a deme with n B alleles, then 
each lineage picked a label uniformly from 1 . . . n, and those picking the same label coalesced. Since 
reproduction is Poisson, this correctly samples from the distribution of lineages given the population 
dynamics. 

Establishment of a locally adaptive allele due to migrational influx 

Now suppose that there are two patches, each of area A and separated by distance R. If the B allele 
has arisen and become established in the first patch, but has not yet appeared in the second, we would 
like to know the time scale on which copies of B move between patches by migration (supposing that 
no B allele arises independently by mutation in the meantime). To determine this, study the fine-scale 
genealogy of alleles that transit between patches, obtaining along the way other useful information about 
the genealogy of B alleles. 

Migration-selection balance ensures that there will be some number of B alleles present in the regions 
where they are disadvantageous, but only very few, and rarely, far away from the patch where B is 
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established. Denote the expected frequency of allele B at location x relative to the patch by q(x), and 
assume that the shape of the patch is at least roughly circular. Following Haldane [1948] or Slatkin 
[1973], one can write down a differential equation to which q(x) is the solution, and show that q(x) 
decays exponentially for large \x\, with a square-root correction in two dimensions: 

q(x) w C (\x\y/2s m /o) (d 1)/2 exp(— \x\\/2s m /a) for large | x , (3) 

where d is the dimension (d = 1 or 2), and C is a constant depending on the geographic shape of the pop- 
ulations and the selection coefficients. These asymptotics fit simulations quite well, as shown in Figure 1. 
To be clear about the assumptions (especially regarding scaling) implicit here, we provide a derivation 
in section Equilibrium Frequency, as well as justification for the asymptotics in section Asymptotics. In 
one dimension, the equation can be solved to give q(x) in terms of Jacobi elliptic functions; see appendix 
A. In applications we fit C to the data; to get concrete numbers from theory we take C = 1 if necessary. 
From simulations, C seems to be not too far from 1 in practice (see below). 




Figure 1. (A) Mean occupation frequencies of individual denies (grey dots) and averaged over denies at 
similar distances (black circles), on a one-dimensional grid of 201 demes, and a two-dimensional grid of 
101 x 101 demes. Superimposed in color is the decay predicted by (3), with C chosen arbitrarily to give a 
reasonable fit. (The disagreement at long distances is due to boundary effects near the edge of the range.) 
(B) Fluctuations: colored lines are snapshots of allele frequencies at evenly spaced times from a 
one-dimensional grid of 151 demes; the solid line shows the cumulative mean occupation frequency across 
all generations. For both plots, the simulation was run for 1000 generations; the focal allele is beneficial in 
the central 10 demes, with s p — 0.1, and deleterious elsewhere, with s m = —0.02; demes had 1000 
individuals in each; dispersal was to nearby demes with a mean dispersal distance of a sa 1.66 times the 
deme spacing. 

This expected frequency q(x) is the time-averaged occupation frequency, i.e. the total number of 
B alleles found across T generations near location x, per unit area, divided by T. If, for instance, 
q(x) — .01 and the population density is p — 10 individuals per unit area, then in a survey tract of 
unit area around x we expect to find one individual every 10 generations - or, perhaps more likely, 10 
individuals every 100 generations. This points to an important fact: close to the original patch, the 
"equilibrium frequency" q(x) describes well the density of the B allele at most times, but far away from 
the patch, the equilibrium is composed of rare excursions of families of B alleles, interspersed by long 
periods of absence. An example of one of these rare excursions is shown in Figure 2. The relevant time 
scale on which B alleles migrate between patches is given by the rate of appearance of such families. To 
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compute the rate of establishment by such migration, we argue that these families (suitably defined) can 
be approximated by independent branching processes whose spatial motion is described by Brownian 
motion. Properties of these, combined with equation (3), will lead us eventually to the expression (12) 
for the rate of effective migration, an approximation that is valid if the distance R is large (at least five 
times a/y/2s m , say). 




space (demes) 



Figure 2. A segment of the temporal dynamics of the same simulation as Figure IB. Space is again shown 
across the horizontal axis, with time on the vertical axis; demes are colored darker red the greater their 
allele frequency at the corresponding time. The 50% and 5% frequency contour lines are in grey, and the 
genealogy of 50 individuals in a rare long-distance excursion is traced back with black lines. 



To make the argument, we decompose the genealogy of B alleles into families in the following way. 
First, consider the genealogy of all B alleles that were alive at any time outside of the original patch. 
This is a collection of trees, each rooted at an allele living outside the patch whose parent was born 
inside the patch. Next, fix some intermediate distance ro from the established patch, and erase from the 
genealogy every allele that has no ancestors living further away than ro to the patch. This fragments 
the original set of trees into a collection of smaller trees that relate to each other all B alleles living 
outside of ro at any time, and some living inside of ro; we refer to these trees as "families". If ro is large 
enough that there are not too many families and interactions between family members are weak, then 
these families can be approximated by a collection of independent spatial branching processes whose 
members are ignored if they enter the original patch. (This can be made formal in the limit of large 
population density, also taking ro large enough that the chance of reaching the original patch is small.) 
This decomposition is illustrated in figure 3. In terms of these families, the equilibrium frequency above 
is then 

q(x) = ( outflux of families ) x ( mean occupation density of a family at x ) , (4) 

and the quantity we wish to compute, the effective rate at which families of migrant B alleles establish 
in a new patch at displacement x from the original, is 

A m ig(:r) = ( outflux of families ) x ( probability a family establishes in patch at x ) (5) 

The quantities on the right-hand side depend implicitly on ro. All terms except the "outflux of families" 
are calculations with spatial branching processes; and we can solve for this outflux using the established 
form for q(x) in (4). 
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The genealogy of migrant families 

To understand the process of adaptation by migration, we therefore need to look more closely at the rare 
families of B alleles far from the original patch where it is already established. Consider one such family, 
that lives far from the original patch. As noted above, we will model this as a branching process, i.e. 
by assuming that each individual migrates, then reproduces, independently of all other family members. 
Migration takes the individual in a random direction for a random distance whose mean square is a 2 , and 
if the individual arrives to a patch where the B allele is favored, we drop the individual from our family. 
Since the B allele is uniformly deleterious elsewhere, we can decouple reproduction and migration by 
first forming a tree in which each individual has a random number of offspring, then recursively assigning 
spatial locations to each individual in the tree, and finally pruning any branches that happen to wander 
into a patch. By our definition of fitness, the mean number of offspring per individual is e~ Sm « 1 — s m , 
and the variance is £ 2 . This pruning removes both branches leading back into the already established 
patch (which are irrelevant) and branches leading into the new patch (we will need to count these). Since 
we are concerned with the rare migrant families that transit quickly between patches, back migrants 
to the already established patch will contribute little, since they are going the wrong direction; so to 
simplify the analysis we ignore this detail. 

Since each individual has on average e~ Sm offspring, the expected family size after time t is e~ t3m . 
Therefore, if we define fc e (t) to be the chance that the family has gone extinct by time t, and let K t be 
the (random) family size conditioned on nonextinction, then by Bayes' rule, e~ Smt = (1 — k e (t))K[Kt\. 
It turns out that if the distribution of offspring numbers is not too heavy-tailed (see Jagers [1975] for 

details), then Kt has a limiting distribution: K t A K, and e _Sm '/(l — k e (t)) — > K[K] as t — > oo. In other 
words, the chance of survival to time t is a constant multiplied by e~ Smt , and conditional on survival, 
the family size is given by a fixed distribution. 

This suggests that families can be well-described by motion of a central "trunk", which dies at a 
random time whose distribution is given by k e (t), and conditional on survival of this trunk, the family 
is composed of side "branches" off of this trunk, as depicted in Figure 3. We can understand this quite 
concretely, by a method described rigorously by Geiger [1999] (see also Chauvin et al. [1991]). When 
we condition on survival until t, we condition on existence of at least one trunk lineage surviving from 
time 0 to time t (the long red lineage in Figure 3). Given this lineage, nothing distinguishes any of 
the other reproduction events - each individual along the trunk lineage must give birth to at least one 
offspring, and all other reproduction events are unconditioned. In other words, the genealogy of a family 
that survives until t can be decomposed into a trunk that lasts until t and that sprouts independent 
branches that may or may not survive until t. Specifically, if we write Zt for the number of offspring in 
the branching process alive at time t, then this gives the following decomposition: 

i W s -1 

(Z t \Z t >0)±l + J2 E Z £?> (6) 

s=0 fe=l 

where each Z^ a,k ^ is an independent copy of Z that arose from the main branch in generation s, and each 
W s is an independent copy of the offspring distribution, conditioned on W s > 1. 

This suggests that we add in the spatial motion by following only the motion of the "trunk" (i.e. the 
red line in figure 3). Its motion is approximately Brownian, with variance a 2 per generation. (Recall a 
is the dispersal distance.) We can use these two ingredients, Brownian motion and a branching process, 
to compute the terms of equations (4) and (5). 

The outflux of migrant alleles 

We are approximating the dynamics of the focal allele in the region further away than ro from the patch 
as the sum of independent migrant families, each of whose dynamics are given by a spatial branching 
process (as depicted in Figure 3). Call B(ro) the region closer than ro to the patch, and dB(ro) its 
boundary. Suppose that there is a constant outflux of migrant families from each point x £ dB(ro), such 
that the long-term proportion of individuals near a point x in dB(ro) that initiate new migrant families 
is 7(2;). To make expressions (4) and (5) more concrete, we need to replace "outflux of families" by a sum 
over points at distance ro from the patch of the outflux 7 at that point. By examining the constituent 
terms, it will turn out that equation (5) is well-approximated by a constant multiple of (4). 
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Figure 3. Cartoon of the decomposition described in the text. Migrants families that get further than rrj 
from the already-established patch (grey circle) are each composed of a "trunk" lineage (in red), whose 
spatial motion we follow, embellished by transient branches. For the depicted long-lived family, these are 
further distinguished by those still alive at time t (in black) and those that did not survive until t (in grey). 
Figure 2 has a depiction of such a family, from a simulation. Note that the spatial motions of lineages are 
independent random walks (and hence should be much wigglier than in this cartoon, which confounds 
space and time). 



First consider equation (4) for the equilibrium frequency. Suppose that Z is one such spatial branching 
process as above, started at time 0 with a single individual at x, and denote by Z t (A) is the number 
of descendant individuals alive at time t living in the spatial region A. The mean occupation density of 
Z at location y can be thought of as the expected total number of offspring of a family beginning at x 
that ever live at y. It is defined to be the density of E[J Q * Z t (-)dt], i.e. the smooth function u(x, y) that, 
integrated over a set of locations S, gives the expected total number of individuals ever living in S: 



Z t (S)dt 



s 



u(x,y)dy. (7) 



Since reproduction and spatial motion are independent, u(x,y) is the sum across all generations of the 
expected number of individuals alive in that generation, multiplied by the probability density that spatial 
motion takes a lineage from x to y, i.e. 

u(x,y) = e~ Smt p t {x,y)dt. (8) 
Jo 

The equilibrium frequency as decomposed in expression (4) is the sum of mean occupation densities of a 
constant outflux of branching processes from dB(ro): 



<l(y) = / i(x)u(x,y)dx. (9) 

JdB{r 0 ) 

This form we can now compare to the expression (5) for the outflux of successful migrants. Consider 
the probability that a migrant family beginning with a single individual at x will ever establish in the new 
patch. We assume this new patch is circular, has relatively small area A and is centered at position y. 
It would be possible to analyze this probability directly, as in Barton [1987]; but here we take a simpler 
route, approximating this by the chance that the trunk hits the new patch, multiplied by the chance that 
at least one member of the family escapes demographic stochasticity and successfully establishes in the 
new patch. Write h,A(x,y) for the probability that the Brownian trunk hits the new patch centered at 
y before the family dies out, and f(A) for the chance that the family manages to establish in the new 
patch, given that it successfully arrives. As for q(y) above, expression (5) is properly an integral against 
the outflux 7(3;): 



Amigfj/) = p / 1 (x)h A (x,y)f(A)dx. (10) 
ias(r 0 ) 

Intuitively, JiA{x,y) counts each family once if it hits the patch, while the occupation density u(x,y) 
counts the total amount of time that the family spends in the patch. Since the expected total time spent 
in the patch does not depend on the distance at which the family started if we know that it survived 
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to hit the patch, it is intuitively reasonable that u(x,y) and /ia(s,j/) are simply related. Following this 
intuition, and that (1 — k e (t)) as e~ Sm fE[K], it turns out (see section Hitting and Occupation) that 

where g(A) is the total expected occupation time of the patch given the family arrives. Recall that 
E[if] is the expected family size after a large number of generations, given survival. Since the integrands 
in expressions 9 and 10 only differ by a factor that (at least asymptotically) does not depend on the 
distance between x and y, we can obtain the migration rate by multiplying the equilibrium frequency 
by this factor. The probability f(A) that the family does not die out once it arrives at the new patch 
is approximately E[l — (1 — Pe) ] ~ 2spE[K]/^ 2 , if each member arriving has chance p e of establishing, 
independently. Combining with the fact that g(A) ~ l/(2s m ) (see section Hitting and Occupation, in 
particular equation (50)), we obtain that 



This does not depend on A, because although f(A) and g(A) in principle depend on the patch size (and 
geometry), the dependence is very weak. For instance, the width of a circular patch only very weakly 
affects the chance of establishment of a new migrant that appears on its edge, as long as the patch is large 
enough. For more discussion, including an expression for the outflux of migrants 7(2:), see section Hitting 
and Occupation. 



The probability of parallel adaptation between patches. 

We now turn to the question of whether the separated patches adapt by parallel genetic changes or by 
the spread of a migrant allele between the patches. As only a single mutation is required for individuals 
to adapt to the poor habitat patch, subsequent mutations that arise after an allele becomes established 
in the patch gain no selective benefit. Similarly, an allele introduced into a patch by migration will not be 
favored by selection to spread, if the patch has already been colonized. Therefore, mutations selectively 
exclude each other from patches, over short time scales, and will only slowly mix together over longer 
time scales by drift and migration, an assumption we also made in [Ralph and Coop, 2010]. In reality, 
different mutations will likely not be truly selectively equivalent, in which case this mixing occurs over a 
time-scale dictated by the difference in selection coefficients. 

We will assume that once a B allele is introduced (by migration or mutation) it becomes established in 
the poor habitat patch rapidly if it escapes loss by demographic stochasticity. Under this approximation, 
and the "selective exclusion" just discussed, after one of the patches becomes colonized by a single B 
allele, the other patch will become colonized by migration or mutation, but not by both. As such, the 
question of how the second patch adapts simply comes down to whether a new mutation or a migrant 
allele is the first to become established in the second patch. To work out how adaptation is likely to 
proceed, we can compare expressions (2) and (12) above for the effective rates of adaptation by new 
mutation and by migration. We work in one dimension, as the square root term appearing for two 
dimensions is relatively unimportant (see figure 1). 

An important point to notice is that effective migration and mutation rates will be on the same 
order if Ap « 2s m exp(— R^/2s m /o), where R is the distance between the patches, and A is the area 
of the not-yet-adapted patch. In other words, the migrational analogue of "mutational influx" (ppA) is 
2ps m exp(— Ry/2s m /a), which depends fairly strongly on the selective disadvantage s m between patches. 
Equivalently, the rates are roughly equal if R/o — log(2s m / (Ap)) / \/2s m , which gives the critical gap size 
past which adaptation will be mostly parallel in terms of selection coefficients, patch size, and mutation 
rate. 

If we take the mutational influx in a patch to be one per 10,000 generations (pAp — .0001) and the 
population density to be 1 (p = 1, so — log(yl^) « 9.2) then adaptation is mostly parallel for patches 
separated by gaps larger than R/a > (9.2 + log(2s m ))/\/2s m . If the selective pressure is fairly strong 
(say, Sm = -05), then for convergence to be likely, the distance between patches must be at least 21.8 
times the dispersal distance (i.e. R/a > 21.8). If s m is much smaller, say s m = .001, the required distance 
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increases to R/a > 67. On the other hand, if the separation is much smaller than this transitional value, 
then adaptations are likely to be shared between patches. 

If the mutational influx is higher - say, pAp = .01 - the required separation between patches is only 
reduced to R/a > 7.3 with s m = -05. If s m = .001, then with this higher mutational influx this calculation 
predicts that mutation will always be faster than migration - however, this is misleading, since with 
s m = .001, for the assumption of well-separated patches to hold we would need R/a > l/y/2s m = 22 A. 
The correct conclusion is that convergent adaptation is much more likely than shared adaptation between 
patches separated by at least 22.4cr; for shorter separations the model breaks down. 

One way to visualize this is by a phase diagram. In figure 4, we give a contour plot for the time until 
adaptation by migration (T m i g = 1/A m i g ) and by new mutation (T mu t = 1/A mut ) as a function of various 
parameters. This gives us an idea of not only the parameter space where parallel adaptation is likely 
(Tmig <C Tmut), but also where adaptation by any means takes a very long time (T m i g and Tmut both very 
large), an important check in applications of these results. 




Figure 4. Shaded contour plots of mean time until adaptation in a new patch occurs by migration (T m ; g , 
blue) or by mutation (T mut , red), as a function of population density (p), distance to an already adapted 
patch in units of dispersal distance (R/a), strength of deleterious selection (s m ), and mutation rate scaled 
by patch area (pA). The right-hand plot shows that A m i g is relatively insensitive to s rn (thanks to the 
square root). The black, dotted line shows values where T m - lg = T mut . Default parameter values were 
A = 100, p = 10, [a = 10~ 8 , s p = 0.1, s m = 0.01, R/a = 20, and £ 2 = 1. Note that population density p and 
mutation influx Ap are shown on a log scale. 



Taking these approximations at face value, and assuming both rates are small, the time until the first 
of the two patches is colonized by the B allele will be approximately exponentially distributed with mean 
l/(2A mut ). Following this, the time until the second patch is subsequently colonized (via either migration 
or new mutation) will be exponentially distributed with mean l/(A mu t + A m i g ). Both these rates scale 
linearly with population density (p) and the probability of establishment of a rare allele (p e w 2st/^ 2 ), 
so that the overall time to adaptation is robustly predicted to increase with offspring variance (C 2 ) and 
decrease with population density and beneficial selection coefficient. Furthermore, the probability that 
the second adaptation is a new mutation, i.e. the probability of parallel adaptation, is 

Amut _ Ap/(2s rn ) { ~\ A\ 

A m ut + A mig (i?) ~ Ap/(2s m ) + (RV2s~/a)- (d - 1)/2 exp (-^s^R/a) ' 
so that the probability of parallel mutation should increase approximately logistically with the distance 
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between the patches, at rate ^s m /a. 
Multiple patches 

Above we worked with two patches separated by distance R; but these calculations apply as well to 
multiple patches in which the B allele can facilitate local adaptation. Under our assumptions of selective 
exclusion and rapid establishment, each patch will adapt through establishment of a single copy of the B 
allele, either by migration or mutation. If we imagine that the times between successive establishments 
are the result of many nearly-independent attempts with small probability of success, the process of 
occupation is well approximated by a Markov chain, whose state records the labeling of patches according 
to the colonizing allele (or none, for those that haven't yet adapted). 

If not yet adapted, a focal patch of area A will acquire the adaptation by new mutation at rate 
2s p /j,pA/^ 2 (equation (2)). Suppose that patches 1, . . . , k are already adapted, and that these patches 
are at distances R\, . . . , R^ away from this unadapted patch. The total rate of adaptation through 
migration, from equation (12), is 

k k 

W*0 = C^ 8 ^ J2 (^^ / 2W^) ~ <d " 1)/2 exp (-RiV^n/v) . (15) 

i = l * i = l 

If the patch adapts through migration, the probability the adapted allele comes from patch i is 

( J R I )- (d - 1)/2 exp(^^V2^/a) 
E - =1 (i?0" (d " 1)/2 exp {-RiV^n/cr) ' 

Equations (15) and (16) specify the transition rates of the process. Since the compound parameter s p p/£ 2 
is common to both migration and mutation rates, it functions only as a time scaling of the process, and 
therefore has no effect on the final configuration, namely, the sets of patches sharing the genetic basis of 
their adaptive response. 

Also note that we can rescale time by the typical patch size, and introduce a parameter, say ak = 
Ak/A, making the properties (other than the time scaling) of the discrete model independent of the 
numerical sizes of the demes themselves. This is complementary to the results of Pennings and Hermisson 
[2006] , who showed that multiple mutations are likely to arise within a panmictic deme if the population- 
scaled mutation rate 2Np, is greater than 1. 

The resulting spatially discrete process is quite general, but if we oversimplify back to the island 
model, it is tractable: suppose the probability of adaptation by mutation is the same for each patch 
and the probability of migration is the same between each pair of patches. The resulting process is a 
continuous-time version of the "Chinese restaurant process" described in Aldous [1985] and Pitman [1995], 
and so the final partition of types across demes has the Ewens sampling distribution with parameter 
Afi/(4:S rn exp(—y/2s m R/a)). Now we can compute most properties we might want about the process. 
For instance, the proportion of demes that shares the same origin as a randomly sampled deme is 
approximately Beta distributed - see Donnelly and Joyce [1989]. The high connectedness of the discrete 
deme island model means that the expected number of distinct alleles grows with the log of the number 
of demes. This strongly contrasts with the continuous spatial model where the local nature of dispersal 
means that doubling the species range will double the number of mutations expected. 

Length of the hitchhiking haplotype 

If a patch adapts through new mutation or a rare migrant lineage, the genomic region surrounding the 
selected site will hitchhike along with it [Maynard Smith and Haigh, 1974], so that at least initially, all 
adapted individuals within a patch share a fairly long stretch of haplotype. Pairs of adapted individuals 
within a patch will share a haplotype of mean genetic length of about \og(pAs p ) / s p around the selected 
site, as long as the patch is reasonably well mixed by dispersal (otherwise see Barton et al. [2013]). This 
association gets slowly whittled down by recombination during interbreeding at the edge of the patch, 
but there will always be longer LD nearby to the selected site. 

When an already adapted patch colonizes another through migration, the newly colonized patch will 
inherit a long piece of haplotype around the selected site from the originating patch. The genetic length 
of this haplotype is roughly inversely proportional to the number of generations the allele spends "in 
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Figure 5. Adaptation by migration: An initially unadapted patch (10 demes at the left end of the 
range) is colonized around generation 5000; a representative lineage is shown in the new patch, which 
follows the original migrant "trunk" lineage between the patches and coalesces with a lineage in the 
unadapted patch. In the left panel, as in Figure 2, color and contours show allele frequency across space 
and time; other simulation parameters are as in Figures IB and 2. The right panel shows (in red) the 
extent of the shared haplotype between the two lineages around the selected locus (at position 0), which 
decreases as time progresses. 



transit", because while very rare, the allele will be mostly in heterozygotes, and so each generation 
provides another opportunity for a different haplotype to recombine closer to the transiting B allele. A 
large, linked haplotype may still arrive and hx in the new patch, in which case the haplotype has literally 
hitchhiked across geography! Figure 5 shows a simulation of such an event, including the lineage that 
founds an adaptive population on a second patch, and the length of the haplotype shared between this 
lineage and one in the original patch. The entire time that the lineage is outside the region where the 
B allele is common (dark red in Figure 5), the haplotype that accompanies it is broken down rapidly. 
After the lineage establishes on the patch, the rate of decay of the haplotype is slowed significantly, since 
most others with which it recombines have similar haplotypic backgrounds. We first discuss the transit 
of the lineage, and its accompanying haplotype, and return to how haplotypes are whittled down within 
established patches in the next section. 

Above we argued that a good model for this transit is a single Brownian trunk lineage surrounded 
by a cloud of close relatives of random size K whose chance of surviving until t generations is 1 — k e (t). 
Consider a single such family, and let r be the (random) time at which it first hits the new patch, with 
r = oo if it never does. We assume that the length of the hitchhiking segment depends only on the 
time spent by the trunk lineage of the first successful migrant "in transit" between the patches, i.e. r 
conditioned on r < oo. In so doing, we neglect recombination between B alleles (since they are at low 
density in transit), and the possibility that more than one successful migrant family is in transit at once 
(so that faster migrants would be more likely to have arrived first). These factors should have a minor 
impact if the rest of our framework fits. 

Our first obstacle is that we do not know 1 — k e {t), the probability a family survives until t, exactly. 
However, we do know that 1 — k e (t) ~ e~ Bm /K[K], i.e. that it has exponential tails. Since K[K] « l/s m , 
it is sensible to make one further approximation, supposing that the random lifetime of a family is 
exponentially distributed with mean l/s m . Once we do this, we can use standard results on hitting 
times of d-dimensional Brownian motion that is killed at rate s m (see Borodin and Salminen [2002] 
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2.2.0.1 and 4.2.0.1). In particular, if the patch is circular with radius w and lies at distance R from the 
already adapted patch, then we know that the Laplace transform of the transit time r is 

( e -R^/2(s m +t)/a d=l 
t K 0 {m^2(s m +t)/a) 

where Ko is a modified Bessel function of the second kind. We are interested in lineages that manage 
to reach the patch before being killed, i.e. having r < oo, which occurs with probability P{r < 00} = 
lim£_>o E [exp(— It)]. 

To keep the expressions simple, in the remainder of this section we only compute quantities for one 
dimension. By Bayes' rule, the Laplace transform of r for a successful lineage is 

E[e~ eT \r < 00] = exp j-^ (s/2{t + s m ) - y/2^j j • (18) 

This can be differentiated to find that the expected transit time is 

E[r] = {RV2^/a) x l/(2s m ) (19) 

and that Var[r] = (R^Z^/a) x l/(2s m ) 2 . 

Since each generation spent in transit provides an opportunity for recombination, if recombination 
is Poisson, the length of the haplotype (in Morgans) initially shared between populations on each side 
of the selected locus is exponentially distributed with mean r, so that if L is the length of hitchhiking 
segment on, say, the right of the selected locus, then 

P{L > £} = E[e~ iT \T < 00], (20) 

and so (one minus) equation (18) in fact gives the cumulative distribution function of L. The form of 
this distribution function implies that if Y is an exponential random variable with rate Ry/2/a, then L 
has the same distribution as (Y + ^/s^) 2 — s m . Furthermore, the expected length of shared hitchhiking 
haplotype is 

E[L] = aV2^/R + o-' 2 /R 2 (21) 

and Var[L] = 2s m a 2 / R 2 + 4a 3 \/2s m /R 3 + 5<r 4 /R 4 . For two dimensions, asymptotics of Bessel functions 
show that L has the same tail behavior, but how other properties might change is unclear. 

As a general rule of thumb, equation (19) means that families who successfully establish move at 
speed ^JlhnO towards the new patch - if s m , the strength of the selection against them, is smaller, the 
need to move quickly is less imperative. Then, equation (21) means that the haplotype length is roughly 
the length one would expect given the mean transit time. Note that we have become used to seeing 
a divided by \/2s m , rather than multiplied; it appears here because 0 /\/2s m is a length scale, and to 
convert it to a speed this is divided by l/2s m . This intuitive picture can be expanded by a central limit 
theorem: for large R, the deviation of r about its mean is asymptotically Gaussian, and the haplotype 
halfiength L is exponential. 



Length of the shared haplotype within a patch 

The other key issue in thinking about the population genetic signal of adaptation in spatial patches 
is the process by which haplotypes are whittled down within patches, which plays into a number of 
aspects of haplotype sharing. The initial haplotype that sweeps within a patch will be dispersed over 
time by recombination. Likewise, the haplotype that is shared between patches coadapted by migration 
(equation 21) will also break down. However, a long time after the initial sweep, we may still expect to find 
individuals within the patch sharing longer haplotypes around the selected locus than with individuals 
elsewhere, since selection against migrants decreases mean coalescence times within the patch near the 
selected locus. We first develop a simple approximation to the rate at which recombination breaks down 
haplotypes within patches. 

Refer again to Figure 5. When a lineage is in a region with high frequency of B alleles (dark red), 
it is very unlikely to recombine off the B background. However, when it wanders out of the patch, into 
regions with lower frequency of B alleles, when it recombines it will usually be on to the background of 
the b allele. 
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A detailed treatment of this is well beyond the scope of this paper, but we can get a good idea 
of the genomic scale of these effects through heuristic arguments. To understand the probability of a 
recombination moving our lineage off of the B background we need to understand what fraction of the 
time our lineage spends in geographic regions where b is common. We will follow the lineage of a single 
B allele in a patch at migration-selection balance, i.e. with the local proportion of B alleles near x equal 
to q(x), and suppose that the lineage stays close enough to the patch that fluctuations about q(x) are 
small. (Note that this assumption differs from previous sections, but here our focus is on a typical lineage 
in the patch rather than the rare lineages that make it far from the patch). If we suppose that type B 
individuals at location y reproduce at rate 1 + s(y), and the chance their offspring move to x is m(y, x) 
(with ^2 x m(y, x) = 1), then the total rate of influx of B alleles from y into x is pq(y)(l + s(y))m(y, x). As 
argued by Hudson and Kaplan [1988] , this implies that the chance that a B allele at x has been inherited 
from a parent at y in a time interval dt is equal to this influx divided by the local abundance of B 
alleles, i.e. dt x q(y)(l + s(y))m(y,x)/q(x). This therefore describes the movement of a lineage backwards 
in time. After the same large-population, continuous-space limit that underlies the rest of this paper 
(section Equilibrium Frequency), if the assumption that fluctuations in q(x) are small, this motion should 
converge to a diffusion process (as assumed by e.g. Hallatschek and Nelson [2008]), notwithstanding some 
technical difficulties (e.g. discontinuity of s(x)). This diffusion, when at x, is pulled toward regions with 
more B alleles by a mean displacement term (1 + s(x))-£^ \og(q(x)(l + s(x))), where s(x) = s p if x is in 
the patch and s(x) = s m otherwise, and moves at speed \Jl + s(x) (which is nearly constant, since s{x) 
is small). In ltd differential notation, the location Xt of a B lineage t generations ago satisfies 

dX t = (1 + s{Xt)) j- log(g(X t )(l + s{Xt)))dt + ay/1 + s(X t )dB t , (22) 

where Bt is a standard Brownian motion. Furthermore, it is easy to verify that the stationary distribution 
of this process has density 

_ g{xf{l + s(x)) 

( } ~ J v i(ym + s(y)W (U) 

A lineage at a locus at genetic distance R from a B allele moves as a B lineage until it recombines onto a 
b background, which occurs when at x with rate r(l — q(x))(l + s(x)/2). The Feynman-Kac formula could 
then be used to write down differential equations solved by various moments of the haplotype length as 
a function of the sampling location of the lineage; instead, we only use the above as a guide for rough 
approximation. 

Based on this picture of the lineage bouncing around in the patch, we propose a simple heuristic to 
describe the probability of recombination. If recombination is on a slower time scale than mixing within 
the patch, then the time until the last recombination is the integral of r(l — g(a;))(l + s(x)/2) against the 
stationary distribution (equation 23). Within the patch, q(x) « 1, so recombination off the B background 
is unlikely to occur, and so the rate of recombination depends to a first approximation on the proportion 
of the patch in which q(x) is intermediate, i.e. where both B and 6 are present. This region where q(x) 
is intermediate has width of order o/^/s, where s = min(s p ,s m ). As we increase the size of a patch, 
the area on which q(x) is intermediate scales with the perimeter of the patch. The per generation rate 
of recombination of a locus linked to the b background is thus reduced by a factor roughly equal to the 
proportion of the patch lying within distance <J I \fs of the boundary, so the effective recombination rate 



d= 1 



r c a = \ ^ Wv^/^+^aaT) _ (24) 

A+wa/y/Z^tr/i/Z+y/A/n) 

It follows that T generations after the founding of a patch we expect a haplotype of genetic length 1 / (r e ffT) 
Morgans to remain linked to the B allele in a typical lineage. Note that the length of haplotype decays 
at a rate roughly proportional to y/s/cr, as the narrower the cline the less opportunity for recombination 
to move our lineage off the B background. 

Finally we can consider the expected length of haplotype shared within a patch. To do this, we 
compare the rate of recombination between B and b haplotypes (equation 24) to the rate of coalescence 
of B haplotypes within a patch. We will assume that mixing within the patch is fast compared to 
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the rate of drift, so that the rate of coalescence of B alleles within a patch is £ 2 /(Ap), the "variance 
effective population size" of the patch (this bypasses the well-known difficulties to spatial coalescence 
[Felsenstein, 1975, Barton et al., 2002]). The rate of recombination and coalescence will be on the same 
order if £ 2 /(Ap) « r(a/y/s)A~ 1 ^ d . Hence the genetic length of the haplotype shared between pairs of 
individuals within the same patches at equilibrium is on the order of 

- ^ (25) 



vpA^-V/d' 



This does not depend on A in one dimension, but decreases with yA in two dimensions. In applications, 
this can be compared to equation (21) for the length of shared haplotype between patches. 



Applications 

Coat color in the rock pocket mouse Chaetodipus intermedins is a classic example of local adaptation 
[Benson, 1933, Dice and Blossom, 1937]. These mice live on rocky outcrops throughout the southwestern 
United States and northern Mexico, and generally have a light pelage similar to the predominant rock 
color. However, in some regions these mice live on darker substrates (e.g. old lava flows), and in these 
patches have much darker pigmentation, presumably to avoid visual predators. Nachman et al. [2003] 
demonstrated that on one of these flows (Pinacate), much of the change in coloration is due to an allele 
at the MC1R locus. This dark allele differs from the light allele by four amino acid changes, and has a 
dominant or partially dominant effect depending on the measure of coat color. These areas of dark rock 
are separated from each other by a range of distances, with some being separated from any other by 
hundreds of kilometers of light colored substrate. The Pinacate allele is not present in a number of other 
populations of dark colored rock pocket mouse, suggesting these populations have adapted in parallel 
[Nachman et al., 2003, Hoekstra and Nachman, 2003]. However, Hoekstra et al. [2005] reasoned that, 
elsewhere in the range, multiple close dark outcrops may share a dark phenotype whose genetic basis 
has been spread by migration despite intervening light habitat. We apply our theory to this example, in 
particular investigating how the probability of parallel mutation depends on the distance between dark 
outcrops. 

A key parameter above was the dispersal distance divided by the square root of strength of selection 
against the focal allele between patches, aj^Js m . Hoekstra et al. [2004] studied the frequency of the 
dark MC1R allele and coat color phenotypes, at sites across the (dark) Pinacate lava flow and at two 
nearby sites in light-colored rock. On the lava flow the dark MC1R allele is at 86% frequency, while at 
a site with light substrate 12km to the west (Tule), the frequency is 3%. The dark allele was entirely 
absent from Christmas pass, a site with light substrate 7km north of Tule, and 3km further from the 
lava flow. In the other direction, the dark MC1R allele was at 34% at a site with light substrate 10km 
to the east of the flow (O'Neill). These numbers give us a sense of a plausible range of characteristic 
lengths. Assuming the dark allele is at 50% frequency at the edge of the lava flow, we can fit formula 
(3) to these frequencies (similar to Hoekstra et al. [2004]). Doing this for Tule we obtain cr/^/s^ ~ 3km, 
and for O'Neill, a/y/s~^ ~ 30km, giving us a range of cline widths. We also need estimate of s m . Using 
a « 1km [French et al., 1968, Allred and Beck, 1963], these cline widths imply that s m — 1/9 and 1/900 
are reasonable values. 

The mutational target size p for the trait is unclear. While the Pinacate dark haplotype differs from 
the light haplotype at four amino acid residues, it is likely that not all of these changes are needed for a 
population to begin to adapt. Also there a number of genes besides MC1R at which adaptive changes 
affecting pigmentation have been identified in closely related species and more broadly across vertebrates 
[Hoekstra, 2006]. To span a range of plausible values, we use a low mutation rate of p = 10 -8 , where the 
adaptive mutation has to occur at a single base pair; and a high mutation rate p = 10~ 5 , equivalent to 
the mutation being able to arise at any of a thousand base pairs. Finally, we set A = 100km 2 (roughly the 
size of the Pinacate patch). In Figure 6 we show the dependence of the probability of parallel mutation 
on the distance between lava flow patches using these parameters, showing that parallel mutation should 
become likely over a scale of tens to a few hundred kilometers between patches. 

Given the large selection coefficient associated with the dark allele on the dark substrate, we expect 
the initial haplotype associated with either a new mutation or migrant allele to be large. We can use 
equation (21) to calculate how long the founding haplotype shared between populations is expected to 
be, as shown in Figure 6. The initial length can be quite long between geographically close patches (tens 
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Figure 6. Top panel: The probability of parallel mutation as a function of the distance between 
environmental patches for two different cline widths and two different mutation rates (using equation (14 
The parameters were chosen to match the example of Chaetodipus intermedins; see text. Bottom panel: 
The initial genetic length of the haplotype shared between patches due to adaptation via migration as a 
function of the distance between environmental patches. 



of kilometers). However, for the wider cline width (a/y/sm = 30km), adaptation by migration can still 
be likely for patches 100km apart, but the shared basis may be hard to detect, as the length of shared 
haplotype can be quite short. 

Furthermore, given the relatively small radius of the patches of dark substrate (often < 10km), we 
should expect the haplotype to be rapidly whittled down within patches after establishment (equation 
24). We estimate that the effective recombination rate is no less than 50% of the actual recombination 
rate. Indeed, if we take the population density to be roughly p ~ 10 per hectare (M. Nachman, personal 
communication), and the variance in offspring number to be £ 2 = 3 (mostly to get round numbers), 
then the shared haplotype length within patches from equation (25) is only expected to be on the 
order of hundreds or thousands of bases (10 _3 -10 _4 cM). This suggests that unless the migration has 
happened fairly recently then it will be difficult to identify shared haplotypes (how recently depends on 
the background level of LD in the population) . 



Discussion 

This paper is an investigation into the basic question: What is the spatial resolution of convergent local 
adaptation? In other words, over what spatial scale of environmental patchiness will the process of 
adaptation develop independent solutions to evolutionary problems? The answer to this depends most 
strongly on er/^/s m , the dispersal distance divided by the square root of the strength of selection against 
the allele between patches. It depends much more weakly on the selective benefit within the patches or 
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(perhaps surprisingly) the population density, although these two factors will determine the time-scale 
over which adaptation will occur. This is in contrast with models of panmictic populations [Hermisson 
and Pennings, 2005, Messer and Petrov, 2013, Wilson et al., 2014] and geographically spread populations 
adapting to homogeneous selection pressures [Ralph and Coop, 2010], where the probability of multiple, 
independently arising adaptive alleles increases with the population size/density. However, in all of these 
models the dependence on the beneficial selection coefficient is absent or weak, due to the fact that 
selection both aids establishment of new alleles and the spread of existing alleles (but see Wilson et al. 
[2014] for the complications of varying population sizes). 

We have also shown that while weaker selection against alleles will make sharing of adaptations 
between patches easier, it will also makes it harder to spot the sharing, since the lucky alleles that 
manage to colonize new patches move slower, and thus carry a shorter shared haplotype. This issue is 
amplified by the fact that the length of haplotype shared within patches decays over time, potentially 
making the identification of shared adaptations to old selection pressures difficult. 

Perhaps the most useful rule-of-thumb quantities we found were the following. The effective rate 
of migration into an as-yet-unadapted patch from an already-adapted patch distance R away - the 
analogue of the mutational influx \ipA - is 2ps m exp(—Ry/2s rn /a). Equivalently, the critical gap size 
between patches past which adaptation is likely independent is R = (a/\/2s m ) log(2s m /(A/i)). Finally, 
successfully transiting migrant lineages move at rate cr v / 2s m , and so shared haplotype lengths between 
patches will be of order i?/(a v / 2s m ). 

In developing the set of approximations in the paper we have ignored a number of complicating 
factors. We now briefly discuss these. 

Standing variation We have focused on relative rates of adaptation, since we are thinking of appli- 
cations in which we know that adaptation has occurred, and the question is whether or not adaptations in 
distinct patches have appeared independently or not. However, especially absolute if rates of adaptation 
by new mutation are very low, any adaptation that does occur may have to make use of standing varia- 
tion. The case of a panmictic population was studied by Hermisson and Pennings [2005], and we study 
the case of a continuous, spatial population in an upcoming paper. If parallelism in local adaptation of 
the sort we study here is due to standing variation rather than new mutation, then the dynamics of adap- 
tation should not depend strongly on migration patterns (but note that the initial spatial distribution of 
standing variation may). 

Dominance We have mostly ignored the issue of dominance by dealing with essentially haploid mod- 
els, and appealing to the fact that the dynamics we study occur where the mutation is rare, and hence 
mostly present only in heterozygotes. Our results should hold as a good approximation to dominant 
and partially dominant alleles (with s m the selection against heterozygotes). If, however, the mutation 
is recessive, then it is essentially neutral where rare, and so would encounter much less resistance to 
spreading between patches. The shape of the cline obtained is given by Haldane [1948]. This is counter- 
acted, however, by the increased difficulty with which the mutation would establish once arriving in a 
new patch. As such it is not clear what our intuition should be about the contribution of recessive alleles 
to adaptation via migration. Further work is needed to put empirical observations of local adaptation 
by recessive alleles in a theoretical context. It is simliarly unclear how the model should extend to a 
polygenic trait. 

Long distance migration We have also ignored the possibility of very long distance migration, 
instead focusing on local dispersal (and using a Gaussian model). However, dispersal distributions can 
be very heavy tailed, with a small fraction of individuals moving very long distances indeed [Levin et al., 
2003, Reynolds and Rhodes, 2009]. In addition, over long time-scales, very rare chance events (mice 
carried off by hurricanes and the like Censky et al. [1998], Nathan et al. [2008]) could play a role in 
spreading migrant alleles if adaptation by other means is sufficiently unlikely. Such tail events could 
greatly increase the probability of parallel adaptation above that predicted by our model. Furthermore, 
if adaptive alleles do move between distant patches via rare, long distance migration then they will be 
associated with a much longer shared haplotype than predicted by equation (21). As such, we view our 
results as a null model by which the contribution of long distribution migrants to adaptation could be 
empirically judged. 
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Other geographic models We have studied circular patches of habitat at long distances from each 
other. Real habitat geometry can be much more complex, e.g. with archipelagos of patches of varying 
sizes, or patches connected by long, skinny corridors, for instance. The work of Cantrell and Cosner 
[1991] comes closest to a general theory of balanced polymorphisms in such habitats. It is possible that 
our techniques could be applied in their much more general setting, as both are based, fundamentally, 
on branching process approximations. It is also interesting to think about the probability of convergent 
adaptation to continuously varying environments, e.g. replicated environmental clines. 

Concluding thoughts The falling cost of population genomic sequencing means that we will soon 
have the opportunity to study the interplay of adaptation with geography and ecology across many 
populations within a species. Our work suggests that even quite geographically close populations may 
be forced to locally adapt by repeated, convergent, de novo mutation when migration is geographically 
limited and selective pressures are divergent. Thus, systems where populations have been repeatedly 
subject to strong local selection pressures may offer the opportunity to study highly replicated convergent 
adaptation within a similar genetic background [Stern, 2013]. Such empirical work will also strongly 
inform our understanding of the ability of gene flow to keep ecologically similar populations evolving in 
concert [Sexton et al., 2013]. Our results suggest that adaptation to shared environments is certainly 
no guarantee of a shared genetic basis to adaptation, suggesting that rapid adaptation to a shared 
environment could potentially drive speciation if the alleles that spread in each population fail to work 
well together [Kondrashov, 2003]. 
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Materials & Methods 

Numerical calculation of the probability of establishment 

When rare, under at least some demographic models, copies of a new mutant allele are approximately 
independent and experience a uniform selective benefit; and can therefore be treated as a branching 
process (a spatial branching process in this case). Furthermore, whether or not a new, beneficial mutation 
establishes or is lost to demographic stochasticity is determined by this phase where it is rare. Fortunately, 
the probability that a branching process does not establish (i.e. dies out) can be found as a fixed point of 
the generating function of the process [Jagers, 1975]. Therefore, we calculated explicitly the generating 
function for a spatial branching process with nearest-neighbor migration on a one-dimensional lattice 
and a Poisson number of offspring with mean 1 + s, where s could vary by location, and iterated this 
forward to convergence to obtain 1 —p(x), the probability a single mutation appearing at x would fail to 
establish. We considered two situations: where a is a step function, and where it has a linear transition. 
These solutions are shown, and parameters described, in figure 7. 

Here (and at other parameter choices) we see that the probability of establishment p(x) goes to 
the equilibrium value (approximately p e = 2s/£ 2 ) within the patch; the transition is fairly symmetrical 
about the edge of the patch, even if the edge of the patch is not sharp. The fit is equally good for 
other parameter values, even if migration can move further than one deme and offspring numbers are 
not Poisson (results not shown). This lends credence to our approximation that the integral of p(x) over 
the entire range is close to p e multiplied by the area of the patch. 

The equilibrium frequency 

For completeness, and clarity as to the scalings on the relevant parameters, here we provide a derivation 
of the differential equations referred to above equation (3), and establish the asymptotics given in that 
equation. One route to the "equilibrium frequency" of the allele outside the range where it is advantageous 
is as follows; see Slatkin [1973] or Barton [1987], Pollak [1966] (or Kolmogorov et al. [1991] or Fisher 
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Figure 7. The probability of establishment of a new mutant allele as a function of distance from the edge 
of the region where it is beneficial, in an abrupt transition (left) and a gradual transition (right). The allele 
is deleterious to the left and beneficial to the right (with selection coefficients =f0.05 respectively); and the 
selection coefficient interpolates linearly between these in the central hatched region. The number of 
offspring is Poisson. The underlying landscape has demes every 15km and a migration rate of 0.5 between 
neighboring demes; the probability was found by numerically solving the equation for the probability of 
establishment of a multi-type branching process. Note that the approximation used in the text is here 
p e w 2s p /£ t 2 =0.1, agreeing with the asymptotic value on the right. 



[1937] or Haldane [1948]) for other arguments in equivalent models, or Etheridge [2000] and/or Dawson 
[1993] for general theory used below. Following is a description of a discrete model (which is nearly the 
one we simulate from). Suppose that the population is composed of a finite number of small demes of 
equal size N arranged in a regular grid, and that selection (for or against) the allele is given by the 
function s(x), with x denoting the spatial location. Each individual at location x reproduces after a 
random lifetime, dying, and then leaving behind in the same deme a random number of offspring with 
distribution given by X; each offspring then migrates independently to a new location chosen randomly 
from the distribution given by x + R, where she replaces a randomly chosen individual. If x + R is outside 
of the range, then she perishes. Each individual's time until reproduction is exponentially distributed: 
the reproduction rate is 1 if it carries the original allele, or with rate 1 + s(x) if it carries the mutant 
allele. Suppose that the number of offspring X has mean \i\ the variance of X will not enter into the 
formula (but assume X is well-behaved). Also suppose that the migration displacement R has mean zero 
and variance a 2 ; if we are in more than one dimension, we mean that the components of the dispersal 
distance are uncorrelated and each have variance a 2 . We also assume that the distribution of R makes 
the associated random walk irreducible and aperiodic. 

Let $f (x) be the proportion of mutant alleles present at location x at time t, and $>t(x) the process 
obtained by taking N — > oo (which we assume exists). Denote by 8 X a single unit at location x, so 
that e.g. <f>4 + 5 X /N is the configuration after a mutant allele has been added to location x. For 
0 < <t> < 1, we also denote by X$ the random number of mutant alleles added if X new offspring carrying 
mutant alleles replace randomly chosen individuals in a deme where the mutant allele is at frequency <f> 
(i.e. hypergeometric with parameters (X, cA)); similarly, X$ is the number lost if the new offspring do not 
carry the allele (i.e. hypergeometric with parameters (X, 1 — cf>)). (We like to think of as a measure, 
but it does not hurt to think of <f> N as a vector; we aren't providing the rigorous justification here.) Then 
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the above description implies that for any sufficiently nice function /($) that 
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In the final expectation, R and $ are independent. This follows by taking first-order terms in l/N in the 
Taylor series for /, and the fact that E[X</,] = 0/i and E[X^,] — (I — <j})^i. We can see two things from this: 
First, since this is a first-order differential operator, the limiting stochastic process $ obtained as N — > oo 
is in fact deterministic (check by applying to /(<&) = &(x) 2 to find the variance). Second, if we want to 
rescale space as well to get the usual differential equation, we need to choose Var[_R] = a 2 and s(x) to be 
of the same, small, order; this is another way of seeing that a I yfs is the relevant length scale (as noted 
by Slatkin [1973]). More concretely, suppose that the grid size is e — > 0, that Var[7i] = (ere) 2 , and that 
the strength of selection is s(x)e, and suppose that &t{x) is deterministic (and sufficiently differentiable) 
and let £(t,x) = $ t / £ (a;); then the previous equation with /($) = $(x) converges to the familiar form: 



wit, x) = p ( ~ d *M> x ) + s ( x )^ x )( l - *)) 



(28) 



Here we have taken first the population size N — > oo and then the grid size e — > 0; we could alternatively 
take both limits together, but not if e goes to zero too much faster than iV grows. One reason for this 
is that at finite N, the process $t is an irreducible, aperiodic finite-state Markov chain with absorbing 
states at 0 and 1; therefore, the inevitable outcome is extinction of one type or another, which is not the 
regime we want to study. 

In one dimension, we are done (see section A); in higher dimensions, we are more interested in the 
mean frequency at a given distance r from a patch. If we take a radially symmetric patch centered at 
the origin (so s only depends on r), and let 7") denote the mean occupation frequency at distance r, 
then the polar form of the Laplacian in d dimensions gives us that (28) is 



d 

-d 2 £(t,r) + a 2 



2r 



-a-£(t,r) + a(r)£(t,r)(l-£(t,r)) 



(29) 



Asymptotic solution for the equilibrium frequency 

If q(r) is a radially symmetric equilibrium frequency and £(t,x) = g(|a;|), then dt£(t,x) — 0. So, using 
(29), a radially symmetric solution with s(r) = — s < 0 for all r > rg solves, for ro < r < oo, 



d r q(r) + 



d- 



-d r q(r) 



2,s 



9(r)(l-g(r))=0 



lim q(r) = lim d r q(r) — 0 0 < q(r) < 1 



(30) 
(31) 



Since q(r) — > 0 as r — > oo, so q(r)(l — q(r)) ~ q(r), it can be shown that the true equilibrium frequency 
q is close, for large r, to the solution to 



d 2 u{r) + -d r u(r) 

r a 



%u{r) = 0 



lim u(r) = lim d r u(r) = 0 



(32) 
(33) 



This has general solution given by a modified Bessel function: using Gradshteyn and Ryzhik [2007] 
8.494.9, the general solution is 



u(r) = C'(r - r 0 ) <2 - d)/2 K (2 - d)/2 ((r - r 0 )V2s/a) 



(34) 
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where C' and ro are chosen to match boundary conditions. Asymptotics of Bessel functions (Gradshteyn 
and Ryzhik [2007], 8.451.6) then imply that 



q(r) « Cr (1 - d)/2 exp (-rV2s/<r) + 0(l/r), 



(35) 



where C is a different constant. 

Hitting and occupation 

Consider a single migrant family Z beginning at x, and write S for the new patch, which has area A and 
is centered at y. Let B t be a Brownian motion with variance a 2 , and t\ an independent Exponential(s m ) 
time. The mean occupation time of Z spent in a region S is, since the marginal distribution of a single 
lineage is Brownian, 



u{x,S) = E 



(36) 
(37) 
(38) 
(39) 
(40) 



where ¥ x gives probabilities for Brownian motion begun at x (i.e. Bo = x), and likewise ¥, x . If we define 
ts to be the hitting time of S by the Brownian motion B, and ns(%) to be the hitting distribution of dS 
by Bmin(t,T t )! by the strong Markov property this is equal to 



Z t {S)dt 
E[Z t ]pt{x,S)dt 
e Smt pt(x,S)dt 
¥ x {r t >tSzB t e S}dt 
ls(B t )dt 



u(x,S) =P"{r s < r t }E 



l s (B t )dt 



(41) 



where now E M denotes expectations for Brownian motion for which the distribution of Bo is fJ,, provided 
S is not pathological. If S is circular with area A, the latter expectation will not depend on x; call this 
9(A). 

Now consider the chance that Z ever hits the new patch, which we call h(x, S). (In the main body, 
this was Ha{x, y), here modified to parallel u(x, S).) If we make the approximation that Z hits the new 
patch only if the trunk of Z does, and recall that 1 — k e (t) is the chance that Z survives for t generations, 
then this is equal to 



h(x,S) w / (1 - k e (t))¥{r s e dt} 
Jo 

e Smt (l - k e {t))e~ amt ¥{T S e dt} 
e Sm \l - k e (t))¥{r s &dtkt<T t } 

mlo nTsedtkt<T ' } 

EM r[rs<rtl - 



(42) 
(43) 
(44) 
(45) 
(46) 



Therefore, we have that 



h(x,S)E[K] « r [t s < r t ] = u(x,S)/g(A). 



(47) 
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If we let q(S) = J s q(y)dy be the total equilibrium occupation of S, then by equations (9) and (10), 

q(S) = [ y(x)u(x,S)dx (48) 

JdB(r 0 ) 



^g(A)E[K] y(x)h(x,S)dx (49) 

JdB(r 0 ) 

9(A)E[K] 

= pj(^) Xmi s(y>- ( 50 ) 

The function g(A) is the expected amount of time that a Brownian motion begun on the edge of a disk of 
area A is expected to spend inside the disk before Tf. This is integral of the Green function for the Bessel 
process of the appropriate order, so using Borodin and Salminen [2002], and letting w be the width of 
the patch, in d — 1, 



g(A) = / dy (51) 

= (l-e- 2wVT ^)/(2s m ) (52) 
and in d = 2, by Gradshteyn and Ryzhik [2007] 5.56.2, 

g(A) = / 2yK 0 (yV2^)dy (53) 



yK 0 (y)dy (54) 
jo 

' (1 - 2wV2^K!(2wV2s^)) ■ (55) 



2s. 

In either case, g(A) ss l/(2s m ), which is the approximation we use in the main text. 

The outflux of families 

It is of interest to have an expression for the "outflux of families" , 7(ro) , defined to be the mean proportion 
of B alleles at distance ro whose lineage traces back to the original patch without venturing further away 
than ro. By expression (9), 

= f qi f W 

Js(r 0 ) u O> y> dx 

for any y. Take y — (r, 0). Note that after scaling by \j2s m jo, 

uv%y) „ . Mg 1 ^ 

q{y) ||j,||-(d-i)/2 e -iMi {00) 

= C\\x - x / r \\-^y\-^-/-W'^. (57) 
Now let x = (ro cos 9, ro sin 8), so that the expression in the exponent is to, first order in 1/r, 



r(||x-as/r|| - 1) = r \ J (l - — cos6>) 2 + r \ sin 2 9 - 1 j (58) 

= -2r 0 cos 0 + O(l /r). (59) 
This implies that, still with r, ro, and 9 defined as above, 

-^r^C'e- 2 ™™ 6 . (60) 

9(2/) 
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In one dimension we are done. In two dimensions, by bounded convergence and Gradshteyn and Ryzhik 
[2007] (3.339), 



L 



— ' dx — > 7(ro) as y — > oo (61) 

= C ( 2 \- 2rocose d6 (62) 
Jo 

= 2 7 rC"/ 0 (-2ro) I (63) 



where 7o is a modified Bessel function of order zero of the first kind. The effect of the scaling is to express 
ro in natural units, i.e. changing the result to 2nC'Io(— 2^/2s m ro/a). 
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A The exact solution to the equilibrium frequency 

Here we describe how to solve (30) exactly in one dimension, i.e. the Fisher-KPP equation with piecewise 
constant selection. Let p(t, x) be the proportion of the allele under spatially varying selection s(x) in one 
dimension, so that 

d t p(t, x) = ^cp-dipit, x) + ns(x)p(t, x)(l - p(t, x)). 

The stable distribution <f>(x) = lim t -+ oo p(t, x) then solves 

dU{x) = -2s(x)<l>{x){l - d>(x))/a 2 , (64) 

with appropriate boundary conditions. First rescale space by a/\/2 so the <r 2 /2 term disappears. If we 
now assume that s(x) is piecewise constant, s(x) = Si for x £ [xi,x i+ i), with xq = — oo and x n+ 2 = oo, 
then the equation is integrable: if we multiply through by 2d x <f>(x) and integrate, then we get that 



if we define 



(d x <f>(x)Y = - I <2,s{x)<t>{x){l-<t>{x))d x <j>{x)dx (65) 
= -Sj0(x) 2 ^1 - |^(a;)J - Ki for x € [xi, xt+i), (66) 



K, - - -,,(Xif ( 1 - ^4>(xi) ) - {d^ix^f. |(i7) 



For ease of reference, we define 



\ ',((.) = .s,.r (1-^)4 A', 



Note that U/(0) = U/(l) = 0, that U;(0) = K t and Vi(l) = Ki + Si/3. We will always have that V(4>) < 0. 
(We have then that d x (f> = — d,pV ((/)), the equation of motion of a particle in potential V. Also note that 
this implies "conservation of energy", i.e. (d x <f>(x)) 2 + V(4>(x)) is constant.) Rearranging, we get that 
dx = d(j) I \J '—V {(f)) , so where 4>(x) is monotone, the inverse is 



xu)=xu*)± r - di> . 
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For each i then define the elliptic function 

where take the positive branch of the square root, and 4>* will be chosen later. Then we have that 
x((f>) — x(tf>o) = ±(F(<j>) — F(<f>o)), or for an appropriate xq, 

<j } (x)=F- 1 (F(<f>(x 0 ))±(x~x 0 )). 

We clearly want lim^-^ d x <f>(±x) = 0, and lim^ _j.no (j>(±x) to be zero or one depending on the sign 
of so and s n +i- Since (d x 4>{x)) 2 = —V(x), this implies that if so < 0, then Ko = 0, while if so > 0 then 
Ko = so/6; and likewise for K n +i- 

We also require that <j>(x) and 4>'(x) are continuous. Continuity of <f>'(x) is equivalent to Vi(x i+1 ) = 
Vi+\(xi+\), which we can rearrange to find an equation for Ki+i in terms of Ki and <fi(xi+i): 

Ki+i - Ki = (si - s i+ i)(p(x 1+ i) 2 (l - 2<j)(x i+ i)/3). 

What about the frequency at the points the selection changes, 4>(xi)7 Well, if <f>(x) is monotone on 
[xi, Xi+i) then we can without loss of generality take <j>* = 0 or 1 depending on the sign of Sj. Otherwise, 
let (j)* be the (unique) root of V% in [xi, so Vi ((/)*) = 0. In this case, <f>* is the maximum or 

minimum of <j> hi the interval: if s > 0, then 4>i = max{0(a;) : x £ [xi, Xi+i)}. Recall we defined Fi 
using (j>*\ now using the fact that <f> is monotone with the opposite sign on either side of <f>*, Xi+i — Xi = 
Fi((j){xi+i)) — Fi{<j>l) +Fi(<j>(xi)) — Fi(4>*), and that F(<j>*) = 0, we know that the length of the ith stretch 
is 

x i+ i -Xi = ± (Fi((p(xi)) + Fi(4>(x i+ i))) . 

Note that all the ±'s are easily relatable to the signs of Si. If we knew 4>{xi) and 4>'{xi), then we'd be 
able to solve the equations for 4>{ x i) and Ki recursively upwards. In some cases, such as Slatkin [1973], 
we can infer (f)(0) and d x (f>(Q) by spatial symmetry. In other cases, we are only given <f>(— oo) and 4>(oo), 
and have to work inwards from the ends. 



A.l Doing the integrals 

An important ingredient in the above method is the integral (68), to which a method for solving the 
Korteweg-de Vries equation can be applied [NEQwiki, 2013]. Recall that Vi{4>) = s i( /> 2 (l - 2<£/3) + Ki, 
with Ki = Vi(0) chosen to match V at the boundaries; since we're just working within an interval with 
s constant, we can rescale space by y2si/a, so that s and a drop from the equation. We then want to 
integrate 

d<j> 



y/-V$) J y/^il- 20/3) + K' 

over a domain where V is always negative. Let </!> 2 (l — 2cj>/3) + K = (a — (j>)(4> ~ P){4> ~ l)-, an( i change 
variables first to y 2 = (a — <f>), and then to x = y/\/a — /3, so that 

d(j> —2dy 



V0 2 (l-20/3) + if ^( Q -/3-y2)( Q _ 7 _ y 2 ) 

-2dx 



v/o^rv^l - x2 )( 1 - S2x2 ) 
with S 2 — (a — f3)/(ot — 7). Now Jacobi's incomplete elliptic integral of the first kind is defined by 

F(x;k)= f — dt 

and the Jacobian elliptic function sn(a:; k) is the inverse: F(sn(x; k);k) — x. As k — > 0, sn(a;; k) — > sin(a;), 
while as k — > 1, sn(a;; k) — > sinh(a;). 



